Abstract: We investigate performance characteristics of secure group communication systems (GCSs) in mobile ad hoc networks that employ intrusion detection techniques for dealing with insider attacks, tightly coupled with rekeying techniques for dealing with outsider attacks. The objective is to identify optimal settings, including the best intrusion detection interval and the best batch rekey interval, under which the system lifetime (mean time to security failure) is maximized while performance requirements are satisfied. We develop a mathematical model based on a stochastic Petri net (SPN) to analyze tradeoffs between security and performance properties, given a set of parameter values characterizing the operational and environmental conditions of a GCS instrumented with intrusion detection tightly coupled with batch rekeying. We compare our design with a baseline system using intrusion detection integrated with individual rekeying to demonstrate its effectiveness.

Keywords: Group communication systems, mobile ad hoc networks, batch rekeying, intrusion detection, stochastic Petri net, group key management, security, performance analysis.

1. Introduction

Mobile ad hoc networks (MANETs) are known to be highly vulnerable to security attacks because of their open medium, dynamically changing network topology, decentralized decision-making and cooperation, lack of centralized authority, lack of resources in mobile devices, and lack of a clear line of defense [2, 23, 33]. Two types of security threats exist: insider and outsider attacks. To deal with outsider attacks, prevention techniques such as authentication and encryption have been widely used. To deal with insider attacks, intrusion detection system (IDS) techniques have been developed for detecting compromised nodes and possibly removing suspicious nodes from the group to achieve high survivability [33].

This paper concerns dynamic group communication systems (GCSs) in MANETs where members of a logical group can join and leave the group and, while they are in the same group, cooperate to accomplish assigned mission tasks, as in military battlefield situations. We consider design options to deal with both insider and outsider attacks to maintain the notion of secure GCSs.

The commonly accepted practice for dealing with outsider attacks in the context of secure GCSs is to maintain a secret key, the so-called group key, among members. The group key may be rekeyed whenever a member joins or leaves (or is evicted). The secret key provides confidentiality and secrecy. Various rekeying algorithms for secure GCSs have been investigated widely in the literature. The most primitive form of rekeying is individual rekeying [21, 30], that is, a rekeying operation is performed immediately when a join or leave event occurs. Batch rekeying [12, 24, 25, 31] and interval-based distributed rekeying algorithms [20] have been proposed for efficient rekeying in dynamic peer groups, with the tradeoff of weakening confidentiality as a result of delaying the update of the group key. Recently, threshold-based periodic batch rekeying protocols [6] have been proposed to explore the tradeoff between secrecy and performance of the system, with the objective of identifying the best batch rekey interval to maximize performance while satisfying security properties. This paper extends our prior work on threshold-based periodic batch rekeying algorithms [6] by removing the assumption of a centralized key server so that it applies to MANETs. We also incorporate contributory key agreement (CKA), i.e., each group member contributes to rekeying of the group key, to deal with group dynamics in a secure GCS setting in MANETs.

While rekeying techniques provide the first line of defense against outsider attacks, a secure, mission-critical GCS application demands the use of IDS techniques against insider attacks to ensure survivability. In the literature, IDS techniques for dealing with insider attacks for secure GCSs in MANETs include [2, 3, 10, 13, 15, 22, 27, 28, 29]. However, these IDS techniques have been studied separately from rekeying techniques.

In this paper, we integrate batch rekeying with IDS in GCSs and analyze the effect of integration in terms of the tradeoff between performance and security properties of the resulting GCS. Our observation is that IDS techniques employed in the context of secure GCSs must be tightly coupled with rekeying techniques. This is because a node that IDS has identified as suspicious or compromised can be evicted either immediately or eventually. The former requires the use of individual rekeying, while the latter could utilize batch rekeying for rekeying efficiency. The decision depends on the system's performance, security, vulnerability, and survivability requirements. Furthermore, while IDS activities introduce extra communication overhead to detect insider attacks, batch rekeying reduces communication cost by delaying evictions of suspicious members detected by IDS, at the risk of exposing the system to security vulnerability.

Our goal is to quantify the tradeoff between performance and security properties for a GCS that incorporates both IDS and rekeying techniques. We aim to determine the best IDS detection interval as well as the best batch rekey interval under which security is maximized while performance requirements are satisfied. Specifically, we consider mean time to security failure (MTTSF) as the security metric for secure GCSs, and the service response time per group operation as the performance metric. In effect, we design and analyze IDS techniques tightly coupled with rekeying techniques applicable to secure GCSs, with the goal of identifying the best way to execute these protocols based on the tradeoff between security and performance metrics. We emphasize that the threshold-based periodic batch rekeying algorithms considered in this paper could degenerate to individual rekeying if conditions dictate that individual rekeying be used to satisfy the security requirement.

This paper makes several contributions with respect to GCSs in MANETs. First, we incorporate security techniques that deal with both outsider and insider attacks to obtain secure GCSs in MANETs, i.e., batch rekeying for dealing with outsider attacks and IDS for dealing with insider attacks. Second, we observe and evaluate the tradeoff between security and performance properties of the resulting GCS. Third, we perform a mathematical analysis based on a stochastic Petri net (SPN) describing the resulting GCS to quantitatively identify optimal settings (i.e., optimal batch rekeying and intrusion detection intervals) that maximize system lifetime (i.e., MTTSF) while satisfying performance requirements (i.e., communication latency per operation). The analytical results allow the GCS to dynamically determine the best settings to run IDS and rekeying to satisfy the system's performance and security requirements. This work extends our preliminary work [9] by (a) considering the Group Diffie-Hellman (GDH) algorithm [23] as the CKA protocol for group members to generate and distribute a new group key upon a group membership change event in MANETs; (b) considering “hop-bits” as the communication cost unit for quantifying network traffic in multi-hop MANETs, where information bits may travel through multiple hops to reach the destination; (c) introducing new security and attack models as well as countermeasures to deal with insider and outsider security attacks; (d) introducing new and efficient calculation procedures for obtaining MTTSF and the service response time for performance analysis; and (e) significantly expanding the analysis, including the effects and sensitivity of key parameters on the MTTSF and service response time performance metrics.

The rest of this paper is organized as follows. Section 2 describes the background of IDS and threshold-based periodic batch rekeying, as well as the contributory key agreement protocols applied for rekeying in this paper. Section 3 gives the system model, including assumptions, the attack and security models, and evaluation metrics. Section 4 develops a mathematical model for performance analysis and discusses how model parameter values are given to characterize the operational conditions and how performance/security metrics are calculated. Section 5 analyzes the results obtained from evaluating the mathematical model and identifies optimal settings. Finally, Section 6 concludes the paper, discusses applicability, and outlines some future research areas.


2. Background


2.1 IDS Protocols

We consider two types of IDS protocols for GCSs in MANETs: host-based IDS and voting-based IDS. Host-based IDS is well studied in the literature. We propose voting-based IDS with the objective of improving system survivability against collusion of compromised nodes.

In host-based IDS, each node performs local detection to determine whether a neighboring node has been compromised. Standard IDS techniques such as misuse detection (also called signature-based detection) or anomaly detection [17, 33] can be used to implement host-based IDS in each node. Each node evaluates its neighbors based on the information collected, mostly route-related and traffic-related information [13, 33]. Each node can also actively collect IDS information, such as recording whether a packet sent to a neighbor is not forwarded as requested. A node can collect data at either the MAC layer or the application layer [13]. The effectiveness of the IDS technique applied (e.g., misuse detection or anomaly detection) for host-based IDS is measured by two parameters, namely, the false negative probability (p1) and the false positive probability (p2).

We propose voting-based IDS for improved robustness against collusion. Under our voting-based IDS scheme, compromised nodes are detected based on majority voting. Specifically, a node, called a target node, is periodically evaluated by m dynamically selected vote-participants. If the majority votes against the target node, the target node is evicted from the system. Our voting-based IDS extends the idea of distributed revocation based on majority voting for evicting a target node in the context of sensor networks [5] and intrusion tolerance techniques based on secret sharing and threshold cryptography in MANETs [18, 34].
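The eviction rule above can be sketched as a simple tally. This is a minimal illustration, assuming each vote has already been authenticated and is encoded as a boolean; the helper name `is_evicted` is hypothetical, and "majority" is taken to mean strictly more than half of the m votes.

```python
from __future__ import annotations


def is_evicted(negative_votes: list[bool]) -> bool:
    """Majority-vote eviction rule (hypothetical helper).

    negative_votes[i] is True when vote-participant i votes against the
    target node, i.e., its host-based IDS judges the target compromised.
    The target is evicted when strictly more than half of the m votes
    are negative.
    """
    m = len(negative_votes)
    if m == 0:
        raise ValueError("at least one vote-participant is required")
    negatives = sum(negative_votes)  # True counts as 1, False as 0
    return negatives > m // 2
```

With an even m, a tie (for example two negative and two positive votes out of four) does not evict the target under this reading.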

We design the protocol to be periodic so that all nodes are checked periodically for intrusion detection as well as for tolerance of collusion of compromised nodes in MANETs. We characterize voting-based IDS by two parameters, namely, the false negative probability (Pfn) and the false positive probability (Pfp). These two parameters are calculated based on (a) the host-based false negative and false positive probabilities (p1 and p2); (b) the number of vote-participants (m) selected to vote for or against a target node; and (c) an estimate of the current number of compromised nodes, which may collude to disrupt the service of the system. In our voting-based IDS, if the majority of the m vote-participants (i.e., more than $\lfloor m/2 \rfloor$ of them) cast negative votes against a target node, the target node is diagnosed as compromised and is labeled “evicted” from the system. Voting-based IDS is entirely distributed, and each node determines its vote based on host-based IDS techniques. The voting-based IDS protocol performs this eviction process periodically. At the beginning of a detection interval, each node is evaluated by m vote-participants; votes are distributed and tallied to decide the fate of the target node.
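The text does not spell out how Pfn and Pfp are derived from (a)-(c). Below is one plausible way to compute them, offered as a sketch rather than as the paper's formulation. It assumes, hypothetically, that the m vote-participants are drawn uniformly at random from the nodes other than the target, that compromised participants collude perfectly (never voting against a compromised target and always voting against a good one), and that good participants err independently with the host-based probabilities p1 and p2. Function and parameter names are illustrative.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent Bernoulli(p) trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def voting_error_probs(n_nodes: int, n_bad: int, m: int, p1: float, p2: float):
    """Return (Pfn, Pfp) for majority voting under the assumptions stated above.

    n_nodes: current group size; n_bad: estimated number of compromised nodes;
    m: vote-participants drawn uniformly from the n_nodes - 1 non-target nodes;
    p1 / p2: host-based false negative / false positive probabilities.
    """
    if not 1 <= n_bad <= n_nodes - 1:
        raise ValueError("need 1 <= n_bad <= n_nodes - 1")
    if not 1 <= m <= n_nodes - 1:
        raise ValueError("need 1 <= m <= n_nodes - 1")
    threshold = m // 2  # evict iff negative votes > threshold
    total = comb(n_nodes - 1, m)

    # False negative: the target is compromised and survives the vote.
    bad_others, good_others = n_bad - 1, n_nodes - n_bad
    p_fn = 0.0
    for j in range(min(bad_others, m) + 1):  # j colluders among participants
        weight = comb(bad_others, j) * comb(good_others, m - j) / total
        # Colluders never vote against a fellow compromised node; each good
        # participant votes against it with probability 1 - p1.
        kept = sum(binom_pmf(k, m - j, 1 - p1) for k in range(min(threshold, m - j) + 1))
        p_fn += weight * kept

    # False positive: the target is good and is evicted anyway.
    bad_others, good_others = n_bad, n_nodes - n_bad - 1
    p_fp = 0.0
    for j in range(min(bad_others, m) + 1):
        weight = comb(bad_others, j) * comb(good_others, m - j) / total
        # Colluders always vote against a good node; good participants do so
        # with probability p2. Eviction needs threshold + 1 negative votes.
        need = max(threshold + 1 - j, 0)
        evicted = sum(binom_pmf(k, m - j, p2) for k in range(need, m - j + 1))
        p_fp += weight * evicted

    return p_fn, p_fp
```

Under these assumptions, raising the estimated number of compromised nodes increases Pfp because colluders can outvote good participants, which illustrates why the estimate in (c) enters the calculation.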

To select the m vote-participants in voting-based IDS, each node periodically exchanges its routing information, location, and id with its neighboring nodes. If a compromised node fakes its id or location, it increases its chance of being detected by the host-based IDS preinstalled on each node. With respect to a target node, nodes that are Hnb(m) hops away are candidates as vote-participants, where Hnb(m) is a design parameter. A coordinator is selected randomly so that adversaries will not have specific targets to attack. We add randomness to the coordinator selection process by introducing a hash function that takes the id of a node concatenated with the current location of the node as the hash key. The node with the smallest returned hash value becomes the coordinator. Since candidate nodes know each other's id and location, they can independently execute the hash function to determine which node should be the coordinator. The coordinator then selects m nodes randomly (including itself) and broadcasts the list of the m selected vote-participants to all group members. After the m vote-participants for a target node are selected this way, each vote-participant independently votes for or against the target node by disseminating its vote to all group members. Vote authenticity is achieved via preloaded public/private key pairs. All group members know who the m vote-participants are and, based on the votes received, can determine whether or not a target node is to be evicted. Under batch rekeying, all evicted nodes, along with newly joining and leaving nodes, are processed at the beginning of the next batch interval, and a new group key is generated based on contributory key agreement among the current group members.
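The coordinator election and participant selection described above can be sketched as follows. The paper does not name a hash function; SHA-256 from Python's `hashlib`, the `id|x,y` key encoding, the seeded random generator, and the helper names are assumptions for illustration. Because every candidate evaluates the same deterministic hash over the same (id, location) pairs, all of them agree on the coordinator without exchanging extra messages.

```python
from __future__ import annotations

import hashlib
import random


def hash_value(node_id: str, location: tuple[float, float]) -> int:
    """Hash of the node id concatenated with its current location (SHA-256 assumed)."""
    key = f"{node_id}|{location[0]:.3f},{location[1]:.3f}".encode()
    return int.from_bytes(hashlib.sha256(key).digest(), "big")


def select_coordinator(candidates: dict[str, tuple[float, float]]) -> str:
    """Node with the smallest hash value; every candidate computes the same result."""
    return min(candidates, key=lambda nid: hash_value(nid, candidates[nid]))


def select_vote_participants(
    candidates: dict[str, tuple[float, float]], m: int, rng: random.Random
) -> list[str]:
    """The coordinator picks m participants at random, including itself."""
    coordinator = select_coordinator(candidates)
    others = sorted(nid for nid in candidates if nid != coordinator)
    if not 1 <= m <= len(others) + 1:
        raise ValueError("m must be between 1 and the number of candidates")
    return [coordinator] + rng.sample(others, m - 1)
```

Only the coordinator needs the random generator; the resulting list is then broadcast, so other members do not have to reproduce the random draw.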

2.2 Rekeying Protocols

We consider three rekeying protocols for GCSs in MANETs:

• Individual rekeying: Rekeying is performed right after each join/leave/eviction request.

• Trusted and Untrusted Double Threshold-based rekeying with CKA (TAUDT-C): Rekeying is performed after a threshold (k1, k2) is reached, where k1 is the number of requests from trusted nodes (i.e., trusted join nodes plus trusted leave nodes) and k2 is the number of requests due to evictions of nodes detected by IDS as compromised. That is, when either k1 or k2 is reached, a rekeying operation based on CKA is performed. This protocol extends TAUDT in [6].

• Join and Leave Double Threshold-based rekeying with CKA (JALDT-C): Rekeying is performed after a threshold (k1, k2) is reached, where k1 is the number of requests from join nodes (i.e., trusted join nodes) and k2 is the number of requests from trusted leave nodes plus forced evictions of nodes detected by IDS as compromised. This protocol extends JALDT in [6].
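The trigger logic of the two double-threshold protocols can be sketched as a small counter. This is a hypothetical illustration: the class name, the request labels, and the `on_batch_interval` flush (standing in for the periodic batch interval mentioned in Section 2.1) are assumptions, and the actual CKA run is replaced by a counter.

```python
from __future__ import annotations

from dataclasses import dataclass, field

REQUEST_TYPES = ("join", "leave", "evict")  # trusted join, trusted leave, IDS eviction


@dataclass
class DoubleThresholdBatcher:
    """Counts pending requests and signals a rekey when either threshold is hit."""

    k1: int
    k2: int
    scheme: str = "TAUDT"  # "TAUDT" or "JALDT"
    pending: list[str] = field(default_factory=list)
    rekeys: int = 0

    def __post_init__(self) -> None:
        if self.scheme not in ("TAUDT", "JALDT"):
            raise ValueError("scheme must be 'TAUDT' or 'JALDT'")
        if self.k1 < 1 or self.k2 < 1:
            raise ValueError("thresholds must be positive")

    def _counts(self) -> tuple[int, int]:
        if self.scheme == "TAUDT":  # trusted (join + leave) vs. untrusted (evict)
            first = ("join", "leave")
        else:  # JALDT: join vs. leave + evict
            first = ("join",)
        c1 = sum(1 for r in self.pending if r in first)
        return c1, len(self.pending) - c1

    def _rekey(self) -> list[str]:
        batch, self.pending = self.pending, []
        self.rekeys += 1  # stands in for one CKA (e.g., GDH) run
        return batch

    def submit(self, request: str) -> list[str]:
        """Queue a request; return the batch if it triggered a rekey, else []."""
        if request not in REQUEST_TYPES:
            raise ValueError(f"unknown request type: {request}")
        self.pending.append(request)
        c1, c2 = self._counts()
        return self._rekey() if c1 >= self.k1 or c2 >= self.k2 else []

    def on_batch_interval(self) -> list[str]:
        """Periodic flush at the end of a batch rekey interval (assumed behavior)."""
        return self._rekey() if self.pending else []
```

Setting k1 = k2 = 1 makes every request trigger a rekey, so the scheme degenerates to individual rekeying, matching the remark in the Introduction.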

TAUDT-C is based on separating rekeying operations into “trusted” and “untrusted” groups, whereas JALDT-C is based on separating rekeying operations into “join” and “leave” groups. We conceive of TAUDT-C as the best model to deal with security attacks since it separates untrusted nodes from trusted ones, thus making both thresholds effective. JALDT-C can be considered a baseline model against which TAUDT-C is compared. Another conceivable rekey protocol is based on three thresholds, separating rekey operations into “join,” “trusted leave,” and “untrusted leave” groups. We believe it will not be as effective as TAUDT-C since it may unnecessarily separate “trusted” operations into two groups, so neither of the two “trusted” thresholds would be effective compared with the “untrusted” threshold. Thus, in this work we only consider double threshold-based batch rekeying protocols along with individual rekeying. Here we note that TAUDT-C and JALDT-C extend TAUDT and JALDT developed in [6] by utilizing a CKA protocol for distributed control, removing a single point of failure in MANETs. For brevity, we call them TAUDT and JALDT in this paper.

Without loss of generality, this paper considers GDH.3 (called GDH for brevity) [23] as the CKA protocol for secret key generation. Besides GDH, other distributed CKA protocols such as TGDH [35] and SEGK [36] can be used for implementing rekeying in our approach. Below we briefly explain how GDH works.

Stage 1: upflow $M_1 \to M_2 \to \cdots \to M_{n-2} \to M_{n-1}$
message size $b_{GDH}$ per message; hop-bits $b_{GDH}(n-2)$

Stage 2: broadcast $M_{n-1} \to M_i$ where $i \neq n-1$
message size $b_{GDH}$; hop-bits $H \times b_{GDH}$

Stage 3: response $M_i \to M_n$ where $i \neq n$
message size $b_{GDH}$ from each $M_i$; hop-bits $H \times b_{GDH}(n-1)$

Stage 4: broadcast $M_n \to M_i$ where $i \neq n$
message size $b_{GDH}(n-1)$ (intermediate values); hop-bits $H \times b_{GDH}(n-1)$

Total communication cost: $n b_{GDH}(2H+1) - b_{GDH}(H+2)$

Figure 1: Message Size Requirement in GDH.

GDH comprises four stages [25]. Each participant $M_i$ shares a common base $\alpha$ and keeps its own secret share $N_i$. The first stage collects contributions from all group members $M_1, M_2, \ldots, M_n$. Specifically, $M_1$ raises $\alpha$ to the power of $N_1$, performing one exponentiation to generate $\alpha^{N_1}$; $M_2$ computes $\alpha^{N_1 N_2}$ by raising $\alpha^{N_1}$ to the power of $N_2$, and so on, until $M_{n-1}$ computes $\alpha^{N_1 \cdots N_{n-1}}$. After processing the upflow message, $M_{n-1}$ obtains $\alpha^{\prod_{k \in [1, n-1]} N_k}$ and broadcasts this value to all other participants in the second stage. In the third stage, every $M_i$ ($i \neq n$) factors out its own exponent and forwards the result to $M_n$. In the final stage, $M_n$ collects the inputs from all other participants, raises every one of them to the power of $N_n$, and broadcasts the resulting $n-1$ values to the rest of the group. Every $M_i$ receives this message in the form $\alpha^{\prod_{k \in [1, n], k \neq i} N_k}$ and can easily generate the intended secret key $K_n = \alpha^{N_1 \cdots N_n}$ by raising it to the power of $N_i$.
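The four GDH.3 stages above can be checked with a toy simulation. This sketch assumes a modulus p = 2^127 - 1 (a Mersenne prime) and base alpha = 3 purely for illustration; these are not secure parameters and are not taken from the paper. Shares are chosen invertible modulo p - 1 so that "factoring out" an exponent in stage 3 becomes exponentiation by the inverse of N_i modulo p - 1. Message authentication and the network are omitted, and the function names are hypothetical.

```python
from __future__ import annotations

import secrets
from math import gcd, prod

P = 2**127 - 1  # toy prime modulus (Mersenne prime); NOT a secure choice
ALPHA = 3       # common base alpha shared by all members (assumed)


def random_share(p: int = P) -> int:
    """Secret share N_i, invertible mod p - 1 so it can be factored out later."""
    while True:
        s = secrets.randbelow(p - 3) + 2
        if gcd(s, p - 1) == 1:
            return s


def gdh3_keys(shares: list[int], p: int = P, alpha: int = ALPHA) -> list[int]:
    """Simulate the four GDH.3 stages; return the key computed by each M_i."""
    n = len(shares)
    if n < 2:
        raise ValueError("need at least two members")
    # Stage 1 (upflow): M_1 -> M_2 -> ... -> M_{n-1}, one exponentiation each.
    upflow = pow(alpha, shares[0], p)
    for i in range(1, n - 1):
        upflow = pow(upflow, shares[i], p)
    # Stage 2 (broadcast): M_{n-1} sends alpha^{N_1 ... N_{n-1}} to everyone.
    broadcast = upflow
    # Stage 3 (response): each M_i, i != n, factors out N_i and sends to M_n.
    responses = [pow(broadcast, pow(shares[i], -1, p - 1), p) for i in range(n - 1)]
    # Stage 4 (broadcast): M_n raises every response to N_n and sends n-1 values.
    finals = [pow(r, shares[n - 1], p) for r in responses]
    # Each M_i applies its own exponent; M_n applies N_n to the stage-2 value.
    keys = [pow(finals[i], shares[i], p) for i in range(n - 1)]
    keys.append(pow(broadcast, shares[n - 1], p))
    return keys


def direct_key(shares: list[int], p: int = P, alpha: int = ALPHA) -> int:
    """K_n = alpha^{N_1 ... N_n} mod p, computed directly for checking."""
    return pow(alpha, prod(shares) % (p - 1), p)  # Fermat: reduce exponent mod p - 1
```

Every member ends with the same value as the direct computation, e.g., `len(set(gdh3_keys(shares))) == 1` for any list of shares produced by `random_share`.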

Figure 1 summarizes the number of hop-bits (i.e., bits multiplied by the number of hops these bits travel) required in each stage of GDH, where $n$ is the number of participants, $b_{GDH}$ is the size of each intermediate value, and $H$ is the average number of hops between a sender and a receiver when the operational area $A$ is modeled as a circle of radius $r$, i.e., $A = \pi r^2$. As shown in Figure 1, stages 1 and 3 are performed using unicast, while stages 2 and 4 employ broadcast. We apply a different number of hops for unicast and broadcast in each stage. In stage 1, we assume that each node can reach the next node within one hop, so it can pass a message to the next node in only one hop. In stages 2 and 4, a message is broadcast to all group members, so the average hop distance separating any two nodes is taken into consideration. In stage 3, for simplicity, we assume that all members except the sender are located near the boundary of the operational area and that the sender broadcasts the message from the center of the operational area. The calculation of the time taken to perform a rekeying operation due to a join/leave/eviction event based on GDH is explained later in Section 4.1.
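The per-stage entries of Figure 1 can be tabulated and checked against the closed-form total. The values n = 10, b_GDH = 1024 bits, and H = 3 used below are illustrative assumptions, not parameters from the paper; the helper names are hypothetical.

```python
from __future__ import annotations


def gdh_hop_bits(n: int, b_gdh: int, h: int) -> dict[str, int]:
    """Hop-bits per GDH stage, following the entries of Figure 1."""
    if n < 2:
        raise ValueError("need at least two members")
    stages = {
        "stage1_upflow": (n - 2) * b_gdh,         # n-2 unicasts of one value, 1 hop each
        "stage2_broadcast": h * b_gdh,            # one value broadcast over H hops
        "stage3_response": h * b_gdh * (n - 1),   # n-1 values unicast to M_n over H hops
        "stage4_broadcast": h * b_gdh * (n - 1),  # n-1 values broadcast over H hops
    }
    stages["total"] = sum(stages.values())  # sum is taken before "total" is added
    return stages


def gdh_total_closed_form(n: int, b_gdh: int, h: int) -> int:
    """Closed form from Figure 1: n * b_GDH * (2H + 1) - b_GDH * (H + 2)."""
    return n * b_gdh * (2 * h + 1) - b_gdh * (h + 2)
```

For n = 10, b_GDH = 1024, and H = 3, the total is 66,560 hop-bits, which matches 10 × 1024 × 7 - 1024 × 5 from the closed form.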


3. System Model


3.1 Assumptions

We assume that the GCS operates in a wireless MANET environment with no centralized key server. Each node is preloaded with its own private/public key pair and the public keys of all other group members for authentication purposes. The group key is rekeyed by running a CKA protocol such as GDH, since a MANET has no centralized trusted entity to generate and disseminate the group key.

We assume that threshold-based periodic batch rekeying is utilized in resource-constrained MANETs to alleviate the rekeying overhead, in terms of communication cost, incurred by join/leave/eviction requests. We assume that a user cannot join the group without authorization. Thus, only “trusted” joins are allowed. A leave, on the other hand, may be “trusted” or “untrusted.” A leave is trusted if it is issued by a user that voluntarily leaves the group. A leave is untrusted if it is caused by the eviction of a detected compromised node. If rekeying is not performed immediately after an untrusted leave, the “to be evicted” node may cause harm to the system since it still possesses the group key.

The group members of the proposed GCS in MANETs are assumed to be spread over a geographical area (A). The workload and operational conditions of a GCS in MANETs can be characterized by a set of model parameters. We assume that the inter-arrival times of trusted join and leave requests are exponentially distributed with rates $\lambda$ and $\mu$, respectively. The inter-arrival time of data packets issued by a node for group communication is also assumed to be exponentially distributed, with rate $\lambda_q$. The exponential assumption can be relaxed since the SPN performance model developed is capable of allowing any general distribution for a transition time. We assume that the time to perform a rekeying operation upon a membership change event (i.e., a join or leave event) or a forced eviction is measured based on GDH [25, 26] to realize distributed key management in MANETs.

We assume that inside attackers attempt to compromise nodes at a variable rate that depends on the number of compromised nodes in the system. We use a linear time attacker function to model the attackers' behavior, considering the possibility of collusion among compromised nodes. Later, in Section 4.1, we explain how to parameterize the linear time attacker function. Compromised nodes are periodically detected by IDS, subject to false positive and false negative probabilities. We assume that IDS performs its function periodically. The detection interval is dynamically adjusted in response to the accumulated number of intrusion incidents that have been detected in the system. Similar to the attacker behavior model above, we use a linear periodic detection function to model IDS detection activities, which increase linearly with the number of compromised nodes detected. Later, in Section 4.1, we also parameterize the linear periodic detection function.
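The linear forms listed in Table 1, $A(m_c) = m_c \lambda_c$ and $D(m_d) = m_d (1/T_{IDS})$, can be sketched as plain functions. As an assumption on our part, the sketch requires the degree counters to be at least 1, so that the rates start at $\lambda_c$ and $1/T_{IDS}$, which is how we read the "initial" labels in Table 1. The helper names are hypothetical.

```python
def attacker_rate(m_c: int, lambda_c: float) -> float:
    """Linear attacker function A(m_c) = m_c * lambda_c (compromises per second)."""
    if m_c < 1:
        raise ValueError("this sketch assumes m_c >= 1")
    return m_c * lambda_c


def detection_rate(m_d: int, t_ids: float) -> float:
    """Linear detection function D(m_d) = m_d * (1 / T_IDS) (detections per second)."""
    if m_d < 1:
        raise ValueError("this sketch assumes m_d >= 1")
    return m_d / t_ids


def detection_interval(m_d: int, t_ids: float) -> float:
    """Current IDS interval 1 / D(m_d) = T_IDS / m_d; it shrinks as detections accumulate."""
    if m_d < 1:
        raise ValueError("this sketch assumes m_d >= 1")
    return t_ids / m_d
```

For example, with an illustrative T_IDS of 600 seconds, the detection interval drops to 150 seconds once m_d reaches 4, which is the adaptive behavior described above.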

We assume that view synchrony is guaranteed [20] in our GCS, which ensures that messages are delivered reliably and in proper order under the same group membership view. We assume that each node has its own IDS preinstalled to perform intrusion detection activities. We assume that our GCS enters a security failure state when either of the two conditions stated below is true:

• Condition C1: a compromised member, either detected or not, requests and subsequently obtains data using the group key. The system is in a failure state because data have been leaked to a compromised node, leading to the loss of system integrity [16] in a security sense.

• Condition C2: more than 1/3 of member nodes are compromised by IDS. We assume the Byzantine failure model [11] such that when more than 1/3 of member nodes are compromised, the system fails because of loss of availability [16] of system service.

We note that Condition C1 reflects false negatives, whereas Condition C2 reflects false positives. That is, when good nodes are falsely identified as bad nodes and evicted, the total node population shrinks, and so does the ratio of good nodes to bad nodes. This increases the probability that C2 is satisfied, thereby causing a security failure.
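The failure test and the false-positive effect described above can be illustrated with a small check. This is a sketch with hypothetical helper names and counts; C1 is modeled simply as a boolean event flag, and C2 uses the Byzantine bound of more than one third of current members being compromised.

```python
def c2_violated(n_members: int, n_compromised: int) -> bool:
    """Condition C2: more than 1/3 of current members are compromised."""
    if not 0 <= n_compromised <= n_members:
        raise ValueError("need 0 <= n_compromised <= n_members")
    return 3 * n_compromised > n_members  # integer test avoids rounding issues


def security_failure(n_members: int, n_compromised: int, leaked_to_compromised: bool) -> bool:
    """Security failure if C1 (data obtained by a compromised member) or C2 holds."""
    return leaked_to_compromised or c2_violated(n_members, n_compromised)
```

For example, 9 compromised nodes among 30 members do not violate C2 (27 is not greater than 30), but if IDS falsely evicts 4 good nodes, the group shrinks to 26 and C2 holds (27 > 26).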

After a member node is detected as compromised by IDS, it can still stay in the system if a batch rekeying protocol is used. This may cause a system failure based on Condition C1 defined above. A node detected as compromised is eventually evicted for security reasons. There is no recovery mechanism available in the system to repair a compromised member and make it a trusted member node again. Initially, all nodes are assumed to be trusted.

3.2 Attack Model

Host-based IDS and voting-based IDS are designed to deal with insider attacks. Outsider attacks (e.g., disrupting traffic, modifying data, eavesdropping) are dealt with by group key encryption and PKI-based authentication. Insider attacks are due to compromised nodes disguised as legitimate members to disrupt the system. Following the attack model discussed in [14], we consider the following insider attack scenarios:

• An adversary can snoop on the wireless channel to learn secret information. For example, the adversary can eavesdrop on messages sent by vote-participants against a target node and can disseminate a fake vote result against the target node to all group members.

• An adversary can collude with other compromised nodes to compromise another node more efficiently. For example, an adversary can cast a negative vote against a healthy node or cast a positive vote for a compromised node.

• An adversary can attempt to obtain secret information by communicating with other group members using its legitimate group key. When this happens, security failure condition C1 has occurred.

• An adversary can leak legitimately authorized secret information to outside attackers. Further, an adversary can share its information with other nodes, including both outside attackers and inside attackers, to compromise other nodes more easily.

3.3 Security Model

Our secure GCS in MANETs meets four requirements in the presence of insider and outsider attacks: confidentiality, integrity, availability, and authentication.

Confidentiality is achieved by preserving secrecy properties for secure GCSs. Group key secrecy is guaranteed since it is computationally infeasible for an adversary to discover the group key without knowing all intermediate values used in GDH. While backward secrecy is preserved, forward secrecy is somewhat relaxed for performance gain, based on a tradeoff between security and performance requirements. Further, key independence is guaranteed since each group key is generated using GDH. These secrecy properties of GDH have been proven in [26].

For integrity, we use a MAC (Message Authentication Code) when a message is disseminated. For example, in group communications between members, MAC(Kg, message) is computed using the group key Kg as the secret key. In voting-based IDS, each vote from a vote-participant is disseminated with a MAC, e.g., MAC(Kg, V), where V refers to a vote. Thus, an outside attacker cannot modify a message without detection unless it knows the secret key Kg, which is possessed only by legitimate members.
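The text does not fix a particular MAC construction. As a sketch, the MAC(Kg, message) and MAC(Kg, V) operations could be instantiated with HMAC-SHA256 from Python's standard `hmac` module; that choice, the byte encoding of a vote, and the helper names are assumptions.

```python
import hashlib
import hmac


def mac_tag(group_key: bytes, message: bytes) -> bytes:
    """MAC(Kg, message), instantiated here as HMAC-SHA256 (an assumption)."""
    return hmac.new(group_key, message, hashlib.sha256).digest()


def mac_verify(group_key: bytes, message: bytes, tag: bytes) -> bool:
    """Recompute the tag with Kg and compare it in constant time."""
    return hmac.compare_digest(mac_tag(group_key, message), tag)
```

A MAC under the shared group key only shows that the sender holds Kg; identifying which vote-participant cast a vote still relies on the preloaded public/private key pairs described for authenticity.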

Availability is maximized in our scheme by introducing adaptive IDS that dynamically adjusts its intrusion detection interval based on the number of intrusions that have been detected by IDS so as to maximize the MTTSF of the system.

For authenticity, each member has a private key, and its certified public key is available for authentication purposes. When a new member joins a group, the new member's identity is authenticated based on the member's public/private key pair by applying a challenge/response mechanism. When a group key is generated through GDH, the source authentication of a participating member is achieved by using the private/public key pair to prevent man-in-the-middle attacks. Moreover, voting-based IDS also uses preloaded public/private key pairs for source authenticity when each node's vote is disseminated to all group members.

3.4 Metrics

We use Mean Time to Security Failure (MTTSF) to measure security and Service Response Time (R) to measure performance properties of our GCS in MANETs, as follows:

• Mean Time to Security Failure (MTTSF): This metric indicates the lifetime of the GCS before it experiences a security failure. For a secure GCS, a security failure occurs when either C1 or C2 defined above is true. As a security metric, a lower MTTSF means a faster loss of system integrity or availability. Therefore, a design goal is to maximize MTTSF. We note that the distribution of the time to security failure and the probability of a security breach are also appropriate security metrics.

• Service Response Time (R): This metric refers to the average service response time per group communication operation, including the wireless channel contention delay and transmission delay when a group communication packet is transmitted. This metric is affected by the traffic intensity of rekeying, join/leave/eviction, and IDS operations. A design goal is to find optimal settings that satisfy the system response time requirement on R while maximizing MTTSF.


4. Performance Model


4.1 Stochastic Petri Net Model

We develop a mathematical model based on SPN, as shown in Figure 2, to describe the behavior of a GCS instrumented with IDS to cope with insider attacks, as well as batch rekeying to deal with outsider attacks. The goal is to identify optimal settings that maximize MTTSF while satisfying imposed performance requirements in terms of R. Table 1 summarizes the model parameters used.
Table 1: Model Parameters.

| Symbol | Meaning |
|---|---|
| A | Operational area, A = πr² (unit: m²) |
| r | Radius of an operational area (m) |
| H | Average number of hops between a sender and a receiver |
| λ | Arrival rate of join requests (s⁻¹) |
| μ | Arrival rate of leave requests (s⁻¹) |
| T_IDS | Initial intrusion detection interval (s) |
| λ_c | Initial attacker rate (s⁻¹) |
| m_d | Degree of compromised nodes that have been detected by IDS |
| D(m_d) | A linear detection function that dynamically returns a periodic detection rate based on m_d, i.e., D(m_d) = m_d(1/T_IDS) (unit: s⁻¹) |
| m_c | Degree of compromised nodes currently in the system |
| A(m_c) | A linear attacker function based on m_c that dynamically returns the rate at which nodes are compromised, i.e., A(m_c) = m_c λ_c (unit: s⁻¹) |
| H_nb(m) | A function that returns the hop number of neighboring nodes based on m |
| λ_q | Group data communication rate per node (s⁻¹) |
| p1 | False negative probability of host-based IDS |
| p2 | False positive probability of host-based IDS |
| T_cm | Communication time for broadcasting a rekey message (s) |
| b_GDH | Length of an intermediate value in applying GDH (bits) |
| b_GC | Packet size for group communication activities (bits) |
| m | Number of vote-participants against a target node |
| BW | Wireless network bandwidth (Mbps) |
| N_init | Initial number of member nodes in the system |
| N | Number of current trusted member nodes |
| MTTSF | Mean time to security failure (s) |
| R | Average service response time per group communication operation (s) |
| Λ_j | Aggregate group join rate (s⁻¹) |
| Λ_l | Aggregate group leave rate (s⁻¹) |
| T_RTS | Transmission delay for RTS (request-to-send) (s) |
| T_CTS | Transmission delay for CTS (clear-to-send) (s) |
| SIFS | Short inter-frame space (s) |
| DIFS | Distributed inter-frame space (s) |
| T_slot | Slot time in random backoff (s) |
| E[CW] | Average contention-window size (unit: slot) |
| T_com | Transmission delay for a packet (s) |
| T_b | Wireless network delay including channel contention time (s) |
| T_c | Channel contention delay with an idle channel (s) |
| T_off | Contention delay due to random backoff when the channel is busy (s) |
| Q | Probability of successful packet transmission without collision |
| λ_packet | Packet arrival rate (s⁻¹) |
The SPN model is constructed as follows:

• We use places to classify nodes. Specifically, T_m holds trusted members, UC_m holds compromised nodes that have not been detected by IDS, FDC_m holds nodes falsely diagnosed by IDS as compromised, DC_m holds compromised nodes that have been detected by IDS, TJ holds nodes that have issued a join request, TL holds nodes that have issued a leave request, and SF represents a system failure state.

• A “token” in our SPN model represents a node in the GCS. The population of each type of node is equal to the number of tokens in the corresponding place. A token in place SF, however, does not represent a node of any type; it simply represents a system failure state.

• We use transitions to model events. All transitions in the SPN model are timed transitions. The time taken for a transition to fire depends on the event associated with it. For example, transition T_RK stands for a “rekeying” event, so the rate at which T_RK fires depends on the time taken for the system to perform a rekeying operation based on GDH. As another example, transitions T_TJ and T_TL represent join and leave events, respectively, with their rates depending on the population in places T_m and UC_m, that is, mark(T_m) + mark(UC_m), where mark(X) returns the number of tokens held in place X.
• We associate triggering conditions with a transition to model the conditions under which an event happens. For example, the triggering condition of T_RK depends on the batch rekeying technique used. For individual rekeying, T_RK is triggered if there is a token in FDC_m, DC_m, TJ, or TL. For TAUDT, T_RK is triggered if either mark(TJ) + mark(TL) reaches k1 or mark(FDC_m) + mark(DC_m) reaches k2. For JALDT, T_RK fires if either mark(TJ) reaches k1 or mark(TL) + mark(FDC_m) + mark(DC_m) reaches k2. Note that places TJ and TL are used to explicitly count the number of join and leave events in order to trigger transition T_RK according to the threshold-based periodic batch rekeying protocol selected by the system.

• We move nodes (tokens) from one place to another when an event occurs. For example, after T_RK fires, all pending join/leave/eviction operations are processed by the system. This is modeled by flushing the tokens in places FDC_m, DC_m, TJ, and TL, which is achieved by specifying the “multiplicity” associated with an arc. For example, to evict all nodes in DC_m, the multiplicity of the arc connecting place DC_m and transition T_RK is mark(DC_m), so after T_RK fires all the tokens (nodes) in place DC_m are flushed, representing that mark(DC_m) nodes have been evicted after a rekeying operation. Simultaneously, all tokens (nodes) in places FDC_m, TJ, and TL are removed as well.

• Initially, all members are trusted; thus, we place all N members in place T_m as tokens. Trusted members may become compromised because of insider attacks at a node-compromising rate A(m_c). This is modeled by firing transition T_CP, which moves one token at a time (if one exists) from place T_m to place UC_m. Tokens in place UC_m represent compromised but undetected member nodes.

• We consider the system to have experienced a security failure when data are leaked to compromised but undetected members, i.e., due to condition C1. Thus, when a token exists in place UC_m, the system is considered to be in a security-vulnerable state. A compromised but undetected member will attempt to compromise data from other members in the group. Because of the use of host-based IDS, a node will reply to such a request only if it fails to identify the requesting node as compromised, which happens with the per-node false negative probability p1. This is modeled by associating transition T_DRQ1 with rate p1 × λ_q × mark(UC_m). The firing of transition T_DRQ1 moves a token into place SF, at which point we regard the system as having experienced a security failure due to condition C1. Specifically, when mark(SF) > 0, the system fails due to condition C1, where mark(SF) returns the number of tokens contained in place SF.
• A compromised node in place UC_m may be detected by IDS before it compromises data in the GCS. The intrusion detection activity of the system is modeled by the detection function with rate D(m_d). Whether a compromised node does damage before it is detected depends on the relative magnitude of the node-compromising rate (A(m_c)) vs. the IDS detection rate (D(m_d)). When transition T_IDS fires, a token in place UC_m is moved to place DC_m, meaning that a compromised, undetected node has now been detected by IDS. For voting-based IDS, the transition rate of T_IDS is mark(UC_m) × D(m_d) × (1 − P_fn), taking into account the false negative probability of the voting-based IDS used. Voting-based IDS can also falsely identify a trusted member node as compromised. This is modeled by moving a trusted member in place T_m to place FDC_m after transition T_FA fires with rate mark(T_m) × D(m_d) × P_fp. Here we note that the voting-based IDS parameters, P_fn and P_fp, can be derived from p1 and p2, the number of vote-participants (m), and the current number of compromised nodes, which may collude to disrupt the service of the system. We describe the parameterization of P_fn and P_fp in Section 4.2.

• After a node is detected by IDS as compromised, it is evicted when a rekeying operation is invoked, triggered by either threshold k1 or k2 in a double threshold-based periodic batch rekeying protocol. This is modeled by firing transition T_RK to evict detected compromised members. The rate at which transition T_RK fires (for performing a rekeying operation based on GDH) is 1/T_cm. Since a detected node (in place DC_m) is not evicted until the next batch rekeying, it introduces a security vulnerability. We model this data leak-out vulnerability by a transition T_DRQ2 connecting DC_m and SF with rate p1 × λ_q × mark(DC_m). The firing of transition T_DRQ2 moves a token into place SF, at which point we regard the system as having experienced a security failure, again due to condition C1. This captures the tradeoff that a double threshold-based periodic batch rekeying algorithm with k1 > 1 or k2 > 1 may improve rekeying efficiency but may also expose the system to this security vulnerability.

• The GCS is characterized by member join and leave events, with rates λ and μ, respectively. This is modeled by associating transitions T_TJ and T_TL with these two rates.

• The system is considered to have experienced a security failure if either of the two security failure conditions, C1 or C2, is met. This is modeled by making the system enter an absorbing state when either C1 or C2 is true. In the SPN model, this is achieved by associating every transition with an enabling function that returns false (thus disabling the transition from firing) when either C1 or C2 is met, and true otherwise. In our model, C1 is true when mark(SF) > 0, representing that data have been leaked to compromised members; C2 is true when more than 1/3 of member nodes are compromised, as indicated in Equation (1) below, where mark(UC_m) returns the number of compromised but undetected nodes in the system, mark(DC_m) returns the number of compromised and detected nodes, mark(FDC_m) returns the number of nodes falsely detected as compromised, and mark(T_m) returns the number of trusted healthy nodes.

$$\frac{mark(UC_m) + mark(DC_m)}{mark(T_m) + mark(UC_m) + mark(FDC_m) + mark(DC_m)} > \frac{1}{3} \tag{1}$$
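The place and transition rules above can be collected in a small plain-Python stand-in for the SPN. This is not input for an SPN tool. The `Marking` class and all helper names are hypothetical, and the join/leave transitions T_TJ and T_TL are omitted for brevity. The rate inputs A(m_c), D(m_d), P_fn, P_fp and T_cm are taken as arguments because they are parameterized in Section 4.2. Following the place definitions, T_FA moves a token from T_m to FDC_m. The sketch encodes the T_RK triggering conditions, the flush performed by the mark(X)-multiplicity arcs, the transition rates, and the absorbing conditions C1 and C2 (Equation 1).

```python
from dataclasses import dataclass


@dataclass
class Marking:
    """Token counts per SPN place (hypothetical container for illustration)."""
    Tm: int = 0    # trusted members
    UCm: int = 0   # compromised, not yet detected
    FDCm: int = 0  # trusted nodes falsely diagnosed as compromised
    DCm: int = 0   # compromised and detected, awaiting eviction
    TJ: int = 0    # pending join requests
    TL: int = 0    # pending leave requests
    SF: int = 0    # > 0 means data have leaked (condition C1)


def rekey_triggered(mk: Marking, protocol: str, k1: int = 1, k2: int = 1) -> bool:
    """Triggering condition of T_RK under each rekeying protocol."""
    if protocol == "individual":
        return mk.FDCm + mk.DCm + mk.TJ + mk.TL > 0
    if protocol == "TAUDT":
        return mk.TJ + mk.TL >= k1 or mk.FDCm + mk.DCm >= k2
    if protocol == "JALDT":
        return mk.TJ >= k1 or mk.TL + mk.FDCm + mk.DCm >= k2
    raise ValueError(f"unknown protocol: {protocol}")


def fire_rekey(mk: Marking) -> None:
    """Fire T_RK: arcs with multiplicity mark(X) flush FDC_m, DC_m, TJ and TL."""
    mk.FDCm = mk.DCm = mk.TJ = mk.TL = 0


def security_failed(mk: Marking) -> bool:
    """Absorbing condition: C1 (mark(SF) > 0) or C2 (Equation 1)."""
    total = mk.Tm + mk.UCm + mk.FDCm + mk.DCm
    # (UC_m + DC_m) / total > 1/3, written with integers to avoid rounding
    c2 = total > 0 and 3 * (mk.UCm + mk.DCm) > total
    return mk.SF > 0 or c2


def transition_rates(mk, a_mc, d_md, p_fn, p_fp, p1, lam_q, t_cm,
                     protocol, k1=1, k2=1):
    """Firing rates of the timed transitions (T_TJ and T_TL omitted).

    The enabling function disables every transition once C1 or C2 holds,
    so an empty dict is returned in the absorbing state.
    """
    if security_failed(mk):
        return {}
    return {
        "T_CP": a_mc if mk.Tm > 0 else 0.0,      # T_m -> UC_m
        "T_IDS": mk.UCm * d_md * (1 - p_fn),     # UC_m -> DC_m
        "T_FA": mk.Tm * d_md * p_fp,             # T_m -> FDC_m
        "T_DRQ1": p1 * lam_q * mk.UCm,           # leak via UC_m -> SF
        "T_DRQ2": p1 * lam_q * mk.DCm,           # leak via DC_m -> SF
        "T_RK": 1.0 / t_cm if rekey_triggered(mk, protocol, k1, k2) else 0.0,
    }
```

A simulator or SPN solver would select the next firing in proportion to these rates. Here they only make the modeling rules explicit.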


4.2 Parameterization

Here we describe the parameterization process, i.e., how to give model parameters proper values reflecting the operational and environmental conditions of the system.

• N: This is the number of currently active group members in the system. This number evolves dynamically as the system evicts compromised nodes. Since a node leaves the group voluntarily with rate μ and joins the group with rate λ, the probability that a node is active in the group is λ/(λ + μ), and the probability that it is not is μ/(λ + μ). Let n be the total group population at any time (n = N_init at t = 0). Then N = nλ/(λ + μ). In the SPN model, we initially place N_init λ/(λ + μ) tokens in place T_m. As the system evolves, N is obtained as mark(T_m) + mark(UC_m), indicating the number of currently active group members.

• Λ_j & Λ_l: These are the aggregate join and leave rates of group nodes, respectively. They are also the transition rates associated with T_TJ and T_TL. The aggregate leave rate Λ_l is equal to the number of active group members (N) multiplied by the per-node leave rate (μ). This aggregate leave rate by active members equals the aggregate join rate Λ_j by non-active group members, since Λ_j = (n − N)λ = nλμ/(λ + μ) = Nμ = Λ_l.

• T_cm: This is the communication time required for broadcasting a rekey message. The reciprocal of T_cm is the rate of transition T_RK. Based on the GDH protocol, T_cm can be calculated as follows:


$$T_{cm} = \begin{cases} \dfrac{N\, b_{GDH}(2H + 1) - b_{GDH}(H + 2)}{BW} & \text{if } N > 1 \\[2ex] \dfrac{b_{GDH}}{BW} & \text{otherwise} \end{cases} \tag{2}$$
Here N again is the number of current member nodes, b_GDH is the length of an intermediate value, H is a constant representing the number of hops separating any two nodes, and BW is the wireless network bandwidth (Mbps) in MANETs.

• A(m_c): This is an attacker function that returns the rate at which a node is compromised in the system. It is also the rate of transition T_CP in our SPN model. Among the three different attacker functions proposed in [8], we adopt the linear attacker function in this paper, as follows:


$$A_{linear}(m_c) = \lambda_c \times m_c, \quad \text{where } m_c = \frac{mark(T_m) + mark(UC_m)}{mark(T_m)} \tag{3}$$

Here λ_c is a base compromising rate and m_c represents the degree of compromised nodes currently in the system, defined as the ratio of N to the number of good nodes.

• D(m_d): This is a detection function that returns the rate at which intrusion detection is invoked, adjusted based on the accumulated number of nodes that have been detected by IDS. It is also the rate of transition T_IDS in our SPN model. We parameterize it based on linear periodic detection as follows:


$$D_{linear}(m_d) = \frac{1}{T_{IDS}} \times m_d, \quad \text{where } m_d = \frac{N_{init}}{mark(T_m) + mark(UC_m)} \tag{4}$$

Here T_IDS is a base intrusion detection interval and m_d represents the “degree” of nodes that have been detected by IDS, defined as the ratio of N_init to N.

• P_fn & P_fp: P_fn is the probability of false negatives, calculated as the number of compromised nodes incorrectly diagnosed as trusted healthy nodes (i.e., a bad node detected as a good node) over the number of detected nodes. On the other hand, P_fp is the probability of false positives, calculated as the number of normal nodes incorrectly flagged as anomalous over the number of detected normal nodes. We consider the intrinsic defects of the host-based IDS in each node as well as collusion among compromised nodes in voting-based IDS. For example, a compromised participant can cast a negative vote against a healthy target node, and it can cast a positive vote for a malicious node. Equation 5 gives the expressions for computing P_fn and P_fp as follows:
In Equation 5, m is the number of vote-participants with respect to a target node, mark(UC_m) is the number of currently compromised nodes, and mark(T_m) is the number of currently healthy nodes. Nodes that have been detected as compromised (those in place DC_m) cannot participate in voting-based IDS. Thus, P_fp is obtained when the majority of the m nodes vote against a good node, including bad nodes that purposefully cast a negative vote against this good node and good nodes that mistakenly diagnose this good node as a bad node with probability p2, resulting in the healthy node being evicted. On the other hand, P_fn arises when the majority of the m nodes vote for a bad node, including bad nodes casting a positive vote for this bad node and good nodes that incorrectly diagnose this bad node as a good node with probability p1. Note that p in Equation 5 is p1 when calculating P_fn and p2 when calculating P_fp.
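The parameterization above can be sketched as plain functions. `rekey_time`, `attacker_rate` and `detection_rate` transcribe Equations (2) to (4), with BW supplied in bits per second. `active_members` and `aggregate_rates` follow the N and Λ_j/Λ_l bullets. Equation 5 itself is not reproduced in this text, so `vote_error_prob` is NOT the paper's Equation 5. It is an assumed majority-vote model built only from the verbal description above, under these assumptions:

- the m voters are drawn without replacement from mark(UC_m) compromised and mark(T_m) healthy nodes;
- every compromised voter votes wrongly;
- each healthy voter errs independently with probability p;
- a wrong verdict results when the wrong votes form a strict majority.

All function names are hypothetical.

```python
from math import comb


def active_members(n_total: float, lam: float, mu: float) -> float:
    """N = n * lambda / (lambda + mu): expected number of active members."""
    return n_total * lam / (lam + mu)


def aggregate_rates(n_total: float, lam: float, mu: float) -> tuple:
    """(Lambda_j, Lambda_l): joins by inactive nodes, leaves by active nodes."""
    n_active = active_members(n_total, lam, mu)
    return (n_total - n_active) * lam, n_active * mu


def rekey_time(n: int, b_gdh: float, hops: float, bw_bps: float) -> float:
    """Equation (2): T_cm in seconds, with BW given in bits per second."""
    if n > 1:
        return (n * b_gdh * (2 * hops + 1) - b_gdh * (hops + 2)) / bw_bps
    return b_gdh / bw_bps


def attacker_rate(lam_c: float, mark_tm: int, mark_ucm: int) -> float:
    """Equation (3): lambda_c * m_c, m_c = (mark(T_m) + mark(UC_m)) / mark(T_m)."""
    return lam_c * (mark_tm + mark_ucm) / mark_tm


def detection_rate(t_ids: float, n_init: int, mark_tm: int, mark_ucm: int) -> float:
    """Equation (4): m_d / T_IDS, m_d = N_init / (mark(T_m) + mark(UC_m))."""
    return (n_init / (mark_tm + mark_ucm)) / t_ids


def vote_error_prob(m: int, n_bad: int, n_good: int, p: float) -> float:
    """ASSUMED majority-vote model (not Equation 5): P(wrong verdict)."""
    if n_bad + n_good < m:
        raise ValueError("not enough eligible voters")
    majority = m // 2 + 1
    total = comb(n_bad + n_good, m)
    prob = 0.0
    for i in range(min(m, n_bad) + 1):           # i compromised voters drawn
        if m - i > n_good:
            continue
        weight = comb(n_bad, i) * comb(n_good, m - i) / total
        need = max(0, majority - i)              # wrong votes still needed from healthy voters
        tail = sum(comb(m - i, j) * p**j * (1 - p) ** (m - i - j)
                   for j in range(need, m - i + 1))
        prob += weight * tail
    return prob
```

Under this assumed model, one would take P_fn = vote_error_prob(m, mark(UC_m), mark(T_m), p1) and P_fp = vote_error_prob(m, mark(UC_m), mark(T_m), p2). With Table 2's N_init = 60 and λ:μ = 4, `active_members` gives the 48 initial tokens, and Λ_j and Λ_l coincide as the text argues.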

4.3 Assessment of Performance Metrics


MTTSF can be obtained using the concept of mean time to absorption (MTTA) in the SPN model. Specifically, we use a reward assignment in which a reward of 1 is assigned to all states except the absorbing states, which are defined by the two security failure conditions (i.e., if either C1 or C2 is met, the system fails). Then the MTTA, or the MTTSF of the system, is simply the expected accumulated reward until absorption, E[Y(∞)], defined as:


$$E[Y(\infty)] = \int_{0}^{\infty} \sum_{i \in S} r_i \, P_i(t) \, dt \tag{6}$$


Here S denotes the set of all states except the absorbing states, r_i (the reward) is 1 for those states, and P_i(t) is the probability of state i at time t.
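With a reward of 1 in every transient state, Equation (6) reduces to the standard mean time to absorption of the underlying CTMC. Let Q_T be the generator restricted to the transient (non-absorbing) states. The vector τ of expected times to absorption then solves −Q_T τ = 1, and the MTTSF is τ evaluated at the initial state. SPN tools perform this computation internally. The sketch below shows it on a hand-built generator using plain Gauss-Jordan elimination. The function name and the two-state toy chain are illustrative and are not taken from the model.

```python
def mean_time_to_absorption(q_transient, start):
    """Solve -Q_T * tau = 1 and return tau[start].

    q_transient: square list of lists, the CTMC generator restricted to the
    transient states (each row sum is minus that state's absorption rate).
    tau[i] is the expected time spent in transient states before absorption,
    i.e. the accumulated reward of Equation (6) with r_i = 1, starting in i.
    """
    n = len(q_transient)
    # Augmented matrix [-Q_T | 1]
    a = [[-float(q) for q in row] + [1.0] for row in q_transient]
    for col in range(n):
        # Partial pivoting for numerical stability
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(n):
            if r != col and a[r][col] != 0.0:
                factor = a[r][col] / a[col][col]
                for c in range(col, n + 1):
                    a[r][c] -= factor * a[col][c]
    return a[start][n] / a[start][start]


# Toy chain: state 0 -> state 1 at rate 2, state 0 -> failure at rate 1,
# state 1 -> failure at rate 3. Analytically 1/3 + (2/3)(1/3) = 5/9.
q = [[-3.0, 2.0],
     [0.0, -3.0]]
print(mean_time_to_absorption(q, 0))
```

For the GCS model, each state would be a reachable marking of the SPN, and the absorbing states would be those where C1 or C2 holds.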
The service response time per group communication packet over the system’s lifetime, R, may be calculated by accumulating the wireless network delay T_b(t) and the transmission delay T_com(t) over MTTSF and dividing by MTTSF, i.e.,


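$$R = \frac{\int_{0}^{MTTSF} \left[ T_b(t) + T_{com}(t) \right] dt}{MTTSF} \tag{7}$$

where

$$T_b = T_c + (T_c + T_{off}) \times \left( \frac{1}{Q} - 1 \right), \qquad T_c = T_{RTS} + SIFS + T_{CTS} + SIFS + DIFS,$$

$$T_{off} = E[CW] \times T_{slot}, \qquad Q = e^{-\lambda_{packet} T_c}, \qquad T_{com} = \frac{b_{GC} + b_{ack}}{BW}$$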

Here T_com accounts for the transmission delay for a group communication packet being delivered to the destination, including the time to get an acknowledgement back; b_GC is the packet size (bits) of a group communication operation and b_ack is the packet size (bits) of an acknowledgement. T_b accounts for the wireless channel contention time, estimated based on the RTS (request-to-send)/CTS (clear-to-send) mechanism in IEEE 802.11 with DCF (distributed coordination function). The contention time depends on the number of retries needed to secure the wireless channel. Each trial has a basic delay of T_c, which includes the transmission time of the RTS and CTS packets plus the inter-frame spacing delays (SIFS and DIFS) intrinsic to IEEE 802.11. If a trial is not successful, there is a backoff time T_off before the next trial takes place.

While in practice the backoff window size is randomly determined over a range, to simplify our analysis we assume that the average window size, denoted by E[CW], is used in each trial. An attempt is successful if no other packet is transmitted during the RTS/CTS sequence. Since the overall packet rate is λ_packet, assuming packets arrive according to a Poisson process, the probability of no packet arrival during T_c, i.e., the probability of no collision, is exp(−λ_packet T_c). By modeling the channel contention process as a geometric distribution with success probability Q, the average number of attempts needed to obtain a collision-free transmission is 1/Q. We ignore the very small propagation delay in calculating T_b.
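The delay terms of Equation (7) can be evaluated directly. The sketch below uses the Table 2 defaults as module constants. In the model, λ_packet varies with the SPN state. Here, as an assumption of this sketch only, the load is treated as piecewise constant over the system lifetime, so the integral in Equation (7) becomes a duration-weighted average. The function names are hypothetical.

```python
import math

# IEEE 802.11 DSSS timing defaults from Table 2 (seconds; E[CW] in slots).
T_RTS, T_CTS, SIFS, DIFS, T_SLOT, E_CW = 0.0003, 0.0004, 0.00002, 0.00005, 0.00005, 256


def contention_delay(lam_packet: float) -> float:
    """T_b = T_c + (T_c + T_off)(1/Q - 1), with Q = exp(-lam_packet * T_c)."""
    t_c = T_RTS + SIFS + T_CTS + SIFS + DIFS     # one RTS/CTS attempt
    t_off = E_CW * T_SLOT                        # average backoff after a failed attempt
    q = math.exp(-lam_packet * t_c)              # Poisson arrivals: P(no collision)
    return t_c + (t_c + t_off) * (1.0 / q - 1.0)  # 1/Q - 1 failed attempts on average


def transmission_delay(b_gc: float = 800, b_ack: float = 32, bw_bps: float = 1e6) -> float:
    """T_com = (b_GC + b_ack) / BW."""
    return (b_gc + b_ack) / bw_bps


def response_time(segments) -> float:
    """Equation (7) for piecewise-constant load; segments = [(duration_s, lam_packet), ...]."""
    lifetime = sum(d for d, _ in segments)       # plays the role of MTTSF
    total = sum(d * (contention_delay(lam) + transmission_delay()) for d, lam in segments)
    return total / lifetime
```

On an idle channel (λ_packet = 0), Q = 1 and T_b reduces to T_c = 0.00079 s with these defaults. T_com is 832 bits / 1 Mbps = 0.000832 s.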
5. Numerical Data and Analysis


We present numerical results obtained from evaluating the SPN model and provide physical interpretations. Our objective is to identify optimal settings, in terms of the optimal double thresholds k1 and k2 of the batch rekeying protocols and the optimal intrusion detection intervals, that maximize MTTSF while satisfying performance requirements in terms of service response time (R). In particular, optimal intrusion detection intervals are identified based on the optimal k1 and k2 thresholds. We compare the system performance of double threshold-based periodic batch rekeying protocols against the baseline of individual rekeying integrated with voting-based IDS.


Table 2: Parameters and Default Values.

| Parameter | Value | Parameter | Value |
|---|---|---|---|
| λ | 1/(60*60 s) | m | 5 |
| μ | 1/(60*60*4 s) | BW | 1 Mbps |
| T_IDS | 30 - 9600 (s) | N_init | 60 |
| T_status | 2 (s) | D(m_d) | Linear to m_d |
| λ_c | 1/(60*60*12 s) | A(m_c) | Linear to m_c |
| λ_q | 1/(60*3 s) | T_RTS | 0.0003 (s) |
| p1 | 1% | T_CTS | 0.0004 (s) |
| p2 | 1% | SIFS | 0.00002 (s) |
| b_GDH | 64 bits | DIFS | 0.00005 (s) |
| b_GC | 800 bits | T_slot | 0.00005 (s) |
| b_ack | 32 bits | E[CW] | 256 |

Table 2 summarizes the default parameter values for the base reference system, in which the false negative probability (p1) and the false positive probability (p2) of host-based IDS are set to 1% each, since a false positive or false negative rate of less than 1% is generally deemed acceptable, reflecting the presence of a medium- to high-quality host-based IDS. The group communication rate (λ_q) is set to once per 3 minutes. The base rate at which nodes are compromised (λ_c) is once per 12 hours, reflecting a medium-high level of attack strength. Later we vary the values of these key parameters to analyze their effects on, and the sensitivity of, system performance. The wireless bandwidth (BW) is considered limited and is set at 1 Mbps. The ratio of the join rate to the leave rate (λ:μ) is set to 4, reflecting the fact that nodes join a group much faster than they leave it. The values used for b_GDH, b_GC, and b_ack reflect the number of information bits used for GDH execution, group communication, and acknowledgement, respectively. The values used for T_RTS, T_CTS, SIFS, DIFS, and T_slot are based on DSSS for IEEE 802.11 as reported in [1, 4]. The number of vote-participants (m) in voting-based IDS is set to 5 for high survivability. Lastly, P_fn and P_fp of voting-based IDS, while not listed here, are calculated based on Equation 5.
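For reuse with the model equations, the Table 2 defaults can be gathered into one configuration. Each "once per X" entry becomes a per-second rate. The dictionary keys are hypothetical names chosen for this example. T_IDS is swept over 30 to 9600 s, so it is left out, and P_fn and P_fp are derived rather than configured. The two derived values reproduce the stated join-to-leave ratio of 4 and the initial token count N_init λ/(λ + μ) placed in T_m (Section 4.2).

```python
# Table 2 defaults: rates in 1/s, times in s, sizes in bits, bandwidth in bits/s.
DEFAULTS = {
    "lam": 1 / (60 * 60),          # per-node join rate: once per hour
    "mu": 1 / (60 * 60 * 4),       # per-node leave rate: once per 4 hours
    "lam_c": 1 / (60 * 60 * 12),   # base compromising rate: once per 12 hours
    "lam_q": 1 / (60 * 3),         # group communication rate: once per 3 minutes
    "p1": 0.01,                    # host-based IDS false negative probability
    "p2": 0.01,                    # host-based IDS false positive probability
    "m": 5,                        # number of vote-participants
    "n_init": 60,                  # initial number of member nodes
    "t_status": 2.0,               # T_status as listed in Table 2
    "bw_bps": 1e6,                 # BW = 1 Mbps
    "b_gdh": 64,
    "b_gc": 800,
    "b_ack": 32,
    "t_rts": 0.0003,
    "t_cts": 0.0004,
    "sifs": 0.00002,
    "difs": 0.00005,
    "t_slot": 0.00005,
    "e_cw": 256,
}

ratio = DEFAULTS["lam"] / DEFAULTS["mu"]
initial_tokens = DEFAULTS["n_init"] * DEFAULTS["lam"] / (DEFAULTS["lam"] + DEFAULTS["mu"])
print(round(ratio, 6), round(initial_tokens, 6))  # 4.0 48.0
```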


Figure 3: Optimal k1 and k2 for TAUDT in MTTSF.
Figure 4: Optimal k1 and k2 for JALDT in MTTSF.
(Both figures plot MTTSF against k1, with one curve per value of k2.)

5.1 Optimal Double Thresholds (k1 and k2)

Figures 3 and 4 show the effect of varying k1 and k2 on MTTSF for TAUDT and JALDT, respectively. The optimal MTTSF in TAUDT is observed at (k1, k2) = (4, 1), as shown in Figure 3. We explain below why (k1, k2) = (4, 1) is optimal under TAUDT. Recall that in TAUDT, k1 governs the number of join/leave nodes (mark(TJ) + mark(TL)), while k2 governs the number of nodes detected as compromised (mark(FDC_m) + mark(DC_m)). As k2 increases, security failure due to condition C1 is more likely to occur, since a larger k2 allows more detected compromised nodes to remain in the system. Allowing k2 to be larger than 1 significantly deteriorates MTTSF; thus, k2 is optimized at 1. When k1 = 1, the probability that rekeying is triggered by k1 is relatively high compared to when k1 > 1. This has the effect of delaying the removal of detected compromised nodes (in DC_m), which again degrades MTTSF due to condition C1. As k1 increases, the probability that rekeying is triggered by k2 increases. This has the effect of quickly removing detected compromised nodes, which increases MTTSF as a result. Lastly, as k1 increases further, not only nodes in DC_m but also nodes in FDC_m are very quickly removed. This has the effect of degrading MTTSF due to condition C2. We also note that when k2 is greater than 1, MTTSF is not very sensitive to k2, since k2 governs untrusted members that are directly related to security failure.
The optimal MTTSF in JALDT is observed at (k1, k2) = (5, 2), as shown in Figure 4. Recall that in JALDT, k2 governs the threshold for both trusted and untrusted leave requests, while in TAUDT, k2 governs only untrusted leave requests. Consequently, the optimal k2 is 2 in JALDT, as opposed to 1 in TAUDT. The optimal k1 is 5 in JALDT because k1 = 5 (as opposed to 4) best balances the probability of security failure due to condition C1 vs. condition C2, as explained earlier, since k1 now governs only join operations.

Figures 5 and 6 show the effect of k1 and k2 on the service response time, R. The trends shown in Figures 5 and 6 closely mirror the overall communication cost per time unit vs. k1 and k2 (not shown here for brevity). In Figure 5, we see that the optimal (k1, k2) is (4, 1), identical to that in Figure 3. In Figure 6, we also observe that the optimal (k1, k2) is (5, 2), identical to that in Figure 4. The existence of the optimal (k1, k2) setting can be explained in the same way as for Figures 3 and 4.


Figure 5: Optimal k1 and k2 for TAUDT in Service Response Time R.
Figure 6: Optimal k1 and k2 for JALDT in Service Response Time R.


5.2 Optimal Intrusion Detection Intervals (T_IDS)


Here we analyze optimal intrusion detection intervals (T_IDS) based on the optimal double thresholds identified above, that is, (k1, k2) = (4, 1) for TAUDT and (k1, k2) = (5, 2) for JALDT, over the full range of T_IDS. We compare system performance under periodic batch rekeying vs. individual rekeying and show that batch rekeying under optimal settings outperforms individual rekeying when IDS is present.
Figure 7 shows the effect of the three rekeying protocols on MTTSF and identifies the optimal intrusion detection interval, T_IDS. We observe that there exists an optimal T_IDS that maximizes MTTSF. In general, as T_IDS increases, MTTSF increases until the optimal T_IDS is reached and decreases thereafter. The low MTTSF at small T_IDS is due to false positives: P_fp is one aspect of the false alarms generated by IDS, so its effect grows as IDS is triggered more frequently, resulting in more nodes being falsely identified as compromised and evicted from the system. Past the optimal point, IDS runs less often, so compromised nodes remain undetected in the system for longer. As expected, we observe that the baseline individual rekeying performs the worst, while TAUDT performs the best in terms of MTTSF among the three. Here TAUDT operates at the optimal setting (k1, k2) = (4, 1) identified in Section 5.1. On one hand, k2 = 1 allows rekeying to be triggered as soon as a compromised node has been identified for eviction. On the other hand, k1 = 4 balances the probability of security failure due to condition C1 vs. condition C2, as explained earlier. Individual rekeying performs the worst because the probability that rekeying is triggered by trusted join/leave events is relatively high compared to the other two rekeying protocols. This has the effect of removing detected compromised nodes in DC_m slowly, decreasing MTTSF due to condition C1. The optimal intrusion detection interval is identified at T_IDS = 240 s for individual rekeying and at 480 s for TAUDT and JALDT, as shown in Figure 7.


Figure 7: Optimal T_IDS in MTTSF.
Figure 8: Optimal T_IDS in R.


Figure 8 shows service response time (R) vs. intrusion detection interval (T_IDS). We again observe that there exists an optimal T_IDS that minimizes the service response time in all three curves. R goes up as T_IDS increases past the optimal point because a larger T_IDS leads to more activity during the IDS period, as more bad nodes remain in the system without being detected by IDS. These bad nodes engage in group communication, status exchange, and voting activities just as good nodes do, thereby causing higher contention for the wireless channel and a higher service response time. On the other hand, when T_IDS is very small, the communication overhead due to IDS dominates, and consequently R is also high. We note, however, that the variation in R is small overall, i.e., R is relatively insensitive to the intrusion detection interval. Among the three curves in Figure 8, we again observe that individual rekeying performs the worst, while TAUDT at the optimal point performs the best.

A system designer can use the results obtained here to identify the T_IDS that optimizes system performance. To maximize MTTSF, T_IDS is identified as 480 s; to minimize R, T_IDS is identified as 600 s. However, the difference in response time between T_IDS = 480 s and T_IDS = 600 s is insignificant. Thus, the optimal T_IDS in this case is set to 480 s, which maximizes MTTSF while satisfying the service response time (R) requirement.
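The selection rule described above, maximizing MTTSF subject to a service response time requirement, amounts to a constrained argmax over the evaluated T_IDS settings. The sketch below assumes the model has already been evaluated at a grid of T_IDS values and that the R requirement can be expressed as an explicit upper bound; the function name `select_t_ids` and the `r_max` threshold are hypothetical and not part of the paper's model.

```python
from typing import Optional, Sequence, Tuple


def select_t_ids(
    results: Sequence[Tuple[float, float, float]],
    r_max: float,
) -> Optional[float]:
    """Return the T_IDS with the largest MTTSF among settings whose
    service response time R does not exceed r_max (hypothetical bound).

    results: (t_ids_seconds, mttsf, response_time) triples obtained from
    evaluating the model; returns None if no setting meets the R bound.
    """
    feasible = [(mttsf, t_ids) for t_ids, mttsf, r in results if r <= r_max]
    if not feasible:
        return None
    # Tuples compare by MTTSF first; ties are broken by the larger T_IDS,
    # i.e., the setting that invokes IDS less often.
    _, best_t = max(feasible)
    return best_t
```

Tightening `r_max` can shift the choice toward the T_IDS that minimizes R, which mirrors the 480 s vs. 600 s comparison above when the R difference is treated as insignificant.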

5.3  Sensitivity  Analysis 

In this section, we perform sensitivity analysis to test how MTTSF and R vs. T_IDS respond to changes in certain key parameters, including λc, λq, and (p1, p2). We use TAUDT under optimal (k1, k2) as the base case since it was identified as the best scheme in Section 5.2.
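The procedure used throughout this section (scale one parameter by multiples of its base value and record the T_IDS that optimizes the metric for each multiple) can be sketched as a simple parameter sweep. This is a minimal sketch: `metric` is a hypothetical stand-in for evaluating the SPN model (e.g., returning MTTSF or R for a given T_IDS and parameter value), and the default multiples follow the curves plotted for λc.

```python
from typing import Callable, Dict, Sequence


def optimal_t_ids_by_scale(
    metric: Callable[[float, float], float],
    t_ids_grid: Sequence[float],
    base_value: float,
    scales: Sequence[float] = (1, 2, 5, 10),
    maximize: bool = True,
) -> Dict[float, float]:
    """For each multiple s of base_value (e.g., s * lambda_c*), evaluate
    metric(t_ids, s * base_value) over the T_IDS grid and return the
    T_IDS that maximizes it (MTTSF) or minimizes it (R)."""
    best: Dict[float, float] = {}
    for s in scales:
        param = s * base_value
        key = lambda t: metric(t, param)  # evaluated within this iteration
        best[s] = max(t_ids_grid, key=key) if maximize else min(t_ids_grid, key=key)
    return best
```

Comparing the returned T_IDS across scales shows whether the optimum moves only with order-of-magnitude changes (as observed for λc) or already within the same order of magnitude (as observed for λq and (p1, p2)).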

Figure 9: Sensitivity of MTTSF vs. T_IDS with respect to λc (curves: λc*, 2λc*, 5λc*, 10λc*; λc* = once per 12 hrs).

Figure 9 shows the sensitivity of MTTSF vs. T_IDS with respect to the compromising rate (λc), which varies from λc* to 10λc*, covering an order-of-magnitude change. We observe that as λc increases, MTTSF decreases because a higher λc causes more compromised nodes to be present in the system. Consequently, the optimal T_IDS value that maximizes MTTSF decreases: with more compromised nodes present as λc increases, the system needs to execute IDS more frequently to maximize MTTSF. Nevertheless, we observe that the optimal T_IDS value that maximizes MTTSF is sensitive to λc only when the order of magnitude of λc changes (e.g., when its value changes from λc* to 10λc*) and is relatively insensitive to λc when its order of magnitude remains the same (e.g., when its value changes from λc* to 2λc*). We attribute this level of sensitivity to the fact that our detection function (see Equation 4) reacts linearly to the attacker strength (see Equation 3).

Figure 10: Sensitivity of MTTSF vs. T_IDS with respect to λq (λq* = once per 3 min).

Figure 10 shows the sensitivity of MTTSF vs. T_IDS with respect to the group communication rate (λq). We observe that when λq is low, so that data-leak attacks are performed infrequently, the positive effect of IDS is pronounced, leading to a high MTTSF. On the other hand, when λq is high, so that data-leak attacks are frequent, the negative effect of IDS is pronounced, so MTTSF is low. We also observe that the optimal T_IDS becomes smaller as λq increases because the system prefers to remove compromised nodes as soon as possible so that they have no chance to perform data-leak attacks. Another observation is that when T_IDS is sufficiently small, e.g., T_IDS < 120 s, MTTSF remains about the same regardless of the magnitude of λq. This is because when IDS is invoked too frequently, the adverse effect of false positives dominates the positive effect of IDS. Lastly, we observe that the optimal T_IDS value is sensitive to λq even when its order of magnitude remains the same (e.g., when λq changes from λq* to 2λq*). We attribute this level of sensitivity to the way voting-based IDS reacts to data-leak attacks (i.e., to prevent Condition C1 from being satisfied).
Figure 11: Sensitivity of MTTSF vs. T_IDS with respect to (p1, p2).

In Figure 11, we check the sensitivity of MTTSF vs. T_IDS with respect to the host-based IDS false negative and false positive probabilities, i.e., (p1, p2). We see that when the IDS quality is low, as indicated by high (p1, p2) values (e.g., the last curve in Figure 11), MTTSF is low. In this case a large T_IDS is preferred, because a long IDS interval lets the system postpone, as much as possible, the false positives generated by the low-quality IDS. We observe that in general the optimal T_IDS value is very sensitive to (p1, p2), even when their order of magnitude is the same (e.g., when their values change from 0.01 to 0.03). We attribute this acute sensitivity to the way voting-based IDS reacts to the host-based IDS false negative and false positive probabilities: the detection interval must be finely adjusted to maximize MTTSF.

Figure 12: Sensitivity of R vs. T_IDS with respect to λc (λc* = once per 12 hrs).
Figure 13: Sensitivity of R vs. T_IDS with respect to λq (λq* = once per 3 min).
We repeat the same sensitivity analysis to test the effects of λc, λq, and (p1, p2) on R vs. T_IDS. The results are shown in Figures 12-14 for λc, λq, and (p1, p2), respectively. In Figure 12, we observe that as λc increases, R increases because more compromised nodes are evicted and thus more rekeying traffic is generated. However, the optimal T_IDS that minimizes R is relatively insensitive to λc because rekeying traffic does not dominate the other sources of traffic in the system. In Figure 13, we observe that as λq increases, R increases due to a higher level of group communication activity. To minimize R in the presence of a high λq value, the system would use a small T_IDS so as to more quickly detect and evict truly or falsely identified compromised nodes, thereby reducing the total population and the net traffic. We observe that the optimal T_IDS that minimizes R is sensitive to λq values in the same order of magnitude because the system must acutely balance the extra traffic introduced by more frequent IDS and eviction activities (as a result of a smaller T_IDS) against the traffic reduction from less group communication and status exchange activity (as a result of the population shrinking through fast eviction). In Figure 14, we first observe that as the (p1, p2) values increase, R decreases. This is because a low-quality IDS characterized by high (p1, p2) values will likely evict nodes identified as compromised (albeit mostly falsely identified) faster than a high-quality IDS characterized by low (p1, p2) values. As a result, the node population and group communication traffic in the system are greatly reduced. Consequently, to minimize R in the presence of high (p1, p2) values, the system would use a small T_IDS to further accelerate the reduction of the total population and the net traffic. Here we observe that the optimal T_IDS that minimizes R is sensitive to (p1, p2) values in the same order of magnitude. We again attribute this level of sensitivity to the system's ability to acutely determine the optimal T_IDS that best balances the traffic sources as (p1, p2) varies.
Figure 14: Sensitivity of R vs. T_IDS with respect to (p1, p2).


6.  Applicability 

To apply the analysis results obtained in this paper, one can summarize the findings in a table listing optimal batch rekey and intrusion detection intervals over a range of parameter values characterizing perceivable operational and environmental conditions. Then, at runtime, the system can perform a table lookup to select the best batch rekey and intrusion detection intervals based on statistical information collected dynamically.
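The runtime table lookup described above can be sketched as a nearest-neighbor search over a precomputed grid. This is a minimal sketch under stated assumptions: the table is keyed by (λc, λq) in events per hour (so λc* = once per 12 hrs is 1/12 and λq* = once per 3 min is 20), each entry stores hypothetical ((k1, k2), T_IDS) settings computed offline, and "nearest" is measured on a log scale. The function `lookup_setting` and the distance rule are illustrative choices, not part of the paper.

```python
import math
from typing import Dict, Tuple

# (lambda_c, lambda_q) in events/hour -> ((k1, k2), T_IDS in seconds).
Key = Tuple[float, float]
Setting = Tuple[Tuple[int, int], float]


def lookup_setting(table: Dict[Key, Setting], lambda_c: float, lambda_q: float) -> Setting:
    """Return the precomputed setting whose grid point is nearest to the
    observed (positive) rates. Rates are compared on a log scale, so a
    factor-of-10 change weighs the same for both parameters."""
    if not table:
        raise ValueError("empty lookup table")

    def dist(key: Key) -> float:
        kc, kq = key
        return math.hypot(math.log(kc / lambda_c), math.log(kq / lambda_q))

    return table[min(table, key=dist)]
```

A log-scale distance is one reasonable choice here because the sensitivity results are phrased in terms of order-of-magnitude changes; a finer grid would be needed along parameters (such as λq) whose optimum shifts within the same order of magnitude.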

While we have used batch rekeying and host-based/voting-based IDS as the example rekeying and IDS algorithms in this paper, the mathematical model developed is generally applicable to other types of rekeying and IDS algorithms. Such changes can be reflected through parameterization (i.e., by assigning proper model parameter values). For example, if we consider a network environment in which a centralized key server and network-based IDS are employed, we can simply replace P_fn and P_fp with p1 and p2, respectively. If we consider other rekeying algorithms, whether centralized or decentralized, or distributed key management protocols, all one has to do is redefine the rekeying conditions based on the state information provided in the SPN model, e.g., the number of join/leave/eviction operations in a state. The performance metric calculation and the methodology developed remain the same for identifying optimal design conditions that maximize MTTSF.
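Redefining the rekeying condition over SPN state information, as suggested above, can be sketched by treating a rekeying policy as a predicate over the counts of pending membership operations. The `PendingOps` summary and both policies below are hypothetical illustrations; in particular, the threshold rule is an assumed example of a batch condition and is not claimed to match the paper's TAUDT or JALDT definitions.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class PendingOps:
    """Hypothetical summary of an SPN state: membership operations
    accumulated since the last rekey."""
    joins: int
    leaves: int
    evictions: int


# A rekeying policy is a predicate over the pending-operation counts.
RekeyCondition = Callable[[PendingOps], bool]


def individual_rekeying(ops: PendingOps) -> bool:
    # Rekey as soon as any single membership change is pending.
    return ops.joins + ops.leaves + ops.evictions >= 1


def threshold_batch(k1: int, k2: int) -> RekeyCondition:
    """Assumed threshold rule: rekey once k1 trusted operations
    (joins + leaves) or k2 evictions have accumulated."""
    def cond(ops: PendingOps) -> bool:
        return ops.joins + ops.leaves >= k1 or ops.evictions >= k2
    return cond
```

Swapping one predicate for another changes only the condition attached to the rekeying transition; the MTTSF and R computations stay as they are.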


7.  Conclusion  and  Future  Work 
In this paper, we investigated a design that integrates intrusion detection with batch rekeying to cope with both outsider and insider attacks for GCSs in MANETs, and analyzed the tradeoff between security and performance properties of the resulting GCS due to the use of these two protocols. We showed that there exist optimal settings in terms of batch rekey intervals (k1 and k2) and intrusion detection intervals under which the system lifetime (in terms of MTTSF) is maximized while performance requirements (in terms of service response time) are satisfied.

The current work considers the case in which the node density is high, so that all nodes belong to one group that is never partitioned. In the future, we plan to extend this work to consider the case in which a GCS may be partitioned due to mobility or to changes in transmission range caused by energy depletion. We also plan to integrate IDS and batch rekeying with hierarchical key management [7] to achieve high scalability, configurability, and survivability for GCSs in MANETs.